
TAGOetal Page 1 of 5 
HIRA.0143 










[FIG. 2: drawing not reproduced; label 106]



As seen in the example of text data relating to a gene, there are cases 
where the information symbolizing a particular theme is not a keyword. 
In the example of gene-related data, the ID or base sequence of a 
particular gene is also information symbolizing a particular theme. In
this example, even if the text data describes genes with identical base
sequences, the sets of keywords that appear in that data could be very
different.

In a database search system that searches databases holding a great
quantity of sets of text data in order to extract and refer to text
data describing a desired theme, the searcher first classifies the
entire body of text data into a plurality of groups in an arbitrary
manner. A cluster analysis is then conducted on each of the groups of
keywords, which are also entered in an arbitrary manner, in units of
the aforementioned text data groups and in accordance with the
frequency of appearance of the keywords. As a result, even in cases
where the keywords appearing in the individual items of text data are
different, it becomes possible to identify text data describing a
desired theme as groups of text data.
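
The procedure summarized above (group the text data, count keyword appearances per group, then cluster the keywords by those counts) can be illustrated with a minimal sketch. The text does not specify a similarity measure or a clustering algorithm, so the cosine similarity, the single-linkage merging, the threshold value, and all function names, group names, and sample texts below are assumptions chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def keyword_group_frequencies(groups, keywords):
    """Count how often each keyword appears in each text data group.

    groups: dict mapping a group name to a list of text strings.
    Returns a dict mapping keyword -> list of counts, one per group,
    ordered by sorted group name.
    """
    names = sorted(groups)
    counts = {
        name: Counter(w for text in groups[name] for w in text.lower().split())
        for name in names
    }
    return {kw: [counts[name][kw.lower()] for name in names] for kw in keywords}


def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_keywords(freq, threshold=0.8):
    """Single-linkage clustering of keywords by per-group frequency profile.

    Two clusters are merged when any pair of their keywords has a cosine
    similarity at or above the (assumed) threshold.
    """
    clusters = [{kw} for kw in freq]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(
                    cosine(freq[a], freq[b]) >= threshold
                    for a in clusters[i]
                    for b in clusters[j]
                ):
                    clusters[i] |= clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters


# Hypothetical data: the searcher has already split the text data into groups
# and entered the keywords, both in an arbitrary manner.
groups = {
    "g1": ["kinase binds receptor", "receptor kinase signal"],
    "g2": ["phosphorylation of receptor by kinase"],
    "g3": ["membrane transport channel", "channel membrane"],
}
keywords = ["kinase", "receptor", "channel", "membrane"]

freq = keyword_group_frequencies(groups, keywords)
clusters = cluster_keywords(freq)
for cluster in clusters:
    print(sorted(cluster))
```

In this sketch, keywords whose frequencies rise and fall together across the text data groups end up in the same cluster, which in turn points to the groups of text data likely to describe a common theme even when individual items use different keywords.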